Statistical problems in general require extensive use of optimizing
techniques. A variety of these methods, depending on the nature of the
problem, have been used by statisticians. In recognition of the importance
of optimizing methods in statistics, two research conferences in this area
were recently held, and the proceedings of the conferences exhibit the wide
variety of statistical applications in which optimization plays an important
part; see Rustagi (1971, 1979).

Optimizing methods have been classified into four main categories:
classical optimizing methods, mathematical programming methods, numerical
methods and variational methods. A review of these methods was recently
given in a paper by Rustagi (1978a). Some recent applications of optimization
in statistics appear in a special issue of Communications in Statistics
edited by Rustagi (1978a). A survey of some of the commonly used variational
methods, with their applications in statistics, has also been given in a book,
Rustagi (1976).

In this paper, some recent applications of variational techniques are
given. Examples are provided from robustness studies, decision theory and
estimation of probability densities.

Under variational methods we include all the techniques which are
required to optimize a functional over a function space. In its simplest
form, a variational problem results if one wants to optimize an integral of
a known function of an unknown function and possibly of its derivative.
In a sense, variational methods are to functional analysis what the methods
of maxima and minima are to calculus.
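As a minimal illustration of the simplest form described above, the following sketch minimizes the integral of y'(x)^2 over [0, 1] with fixed end values. The functional, the grid size and the function name are assumptions chosen for illustration; they do not come from the paper. Setting the derivative of the discretized integral to zero gives a discrete analogue of the Euler-Lagrange equation, which is solved as a linear system.

```python
import numpy as np

def minimize_dirichlet_energy(n=50, y0=0.0, y1=1.0):
    """Minimize the discretized J[y] = integral of y'(x)^2 over [0, 1]
    subject to y(0) = y0 and y(1) = y1 (illustrative functional)."""
    x = np.linspace(0.0, 1.0, n + 1)
    # J_h(y) = sum_i (y[i+1] - y[i])^2 / h. Setting dJ_h/dy[i] = 0 at the
    # interior nodes gives the discrete Euler-Lagrange equation
    #   -y[i-1] + 2*y[i] - y[i+1] = 0,   i = 1, ..., n-1.
    A = 2.0 * np.eye(n - 1) - np.eye(n - 1, k=1) - np.eye(n - 1, k=-1)
    b = np.zeros(n - 1)
    b[0] += y0    # known boundary value moved to the right-hand side
    b[-1] += y1
    y = np.empty(n + 1)
    y[0], y[-1] = y0, y1
    y[1:-1] = np.linalg.solve(A, b)
    return x, y

x, y = minimize_dirichlet_energy(n=20)
```

For this functional the continuous Euler-Lagrange equation is y'' = 0, so the minimizer is the straight line joining the end values, and the discrete solution reproduces it.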

Some of the early results in robustness studies required optimizing
the variance of M-estimates, introduced by Huber (1972), over the class of
symmetric distributions. Such a criterion can also be stated in terms of
minimizing Fisher's information, and explicit solutions of these problems
require variational methods. Recent extensions of the applications of
variational techniques to dependent situations have been discussed by Portnoy
(1977), and the method of geometry of moment spaces has been utilized by
Collins and Portnoy (1979) for more general situations. This problem is
discussed in Section 2.

Questions of admissibility of certain decision problems arise in many
contexts. Brown (1971) has recently studied the admissibility of certain
decision functions in the multivariate normal case under a quadratic loss
criterion. These questions lead naturally to variational problems. Using
the classical theory of the calculus of variations and the Euler-Lagrange
equations, the inadmissibility of the decision function was exhibited with
the help of the nonexistence of the solution of an optimization problem.
This novel application of a variational technique is especially illuminating,
as it provides a method of verifying admissibility under fairly general
conditions. We discuss further details in Section 3.

Estimation of densities utilizing penalized maximum likelihood methods
has been discussed by Good (1971) and by Good and Gaskins (1971). A general
formulation of the resulting optimization problem in an abstract setting
has been given by DeMontricher, Tapia and Thompson (1975). The existence of
the solution of the proposed optimization problems has been proved and
certain characterizations of the optimum solution are given. This topic
is discussed in Section 4.


2. Robust statistics and variational techniques


There are several situations where variational techniques play an
important role in the study of robust statistics. In this section we
consider only two such examples. The first example is due to Huber (1972),
Portnoy (1977) and Collins and Portnoy (1979), and the second is due to
Bickel (1965).

In Huber's notation, we consider the M-estimates of a location parameter
$\theta$ for the probability density $f(x-\theta)$, with c.d.f.
$F(x-\theta)$, based on a random sample $X_1, X_2, \ldots, X_n$. $T_n$ is
called an M-estimate for $\theta$ if it maximizes

\[ \sum_{i=1}^{n} \rho(X_i - T_n), \]

where $\rho$ is a given function. M-estimates are also obtained if we solve
the following type of equation

\[ \sum_{i=1}^{n} \psi(X_i - T_n) = 0, \]

where $\psi = \rho'$. M-estimates include least-squares and maximum
likelihood estimates as examples, if we choose $\rho(x) = -x^2$ and
$\rho(x) = \log f(x)$, that is $\psi(x) = f'(x)/f(x)$, respectively.
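The estimating equation, the sum of $\psi(X_i - T_n)$ set to zero, can be solved numerically. The sketch below uses bisection and assumes that $\psi$ is odd and nondecreasing, so that the sum is nonincreasing in $T$. The clipped-linear $\psi$ with cutoff k = 1.345 is a common choice used here only as an assumption for illustration, and the function names are hypothetical.

```python
def huber_psi(x, k=1.345):
    """Clipped-linear psi: identity on [-k, k], constant +/-k outside."""
    return max(-k, min(k, x))

def m_estimate(sample, psi, tol=1e-10):
    """Solve sum_i psi(x_i - T) = 0 for T by bisection.

    Assumes psi is odd and nondecreasing, so the score is >= 0 at
    T = min(sample), <= 0 at T = max(sample), and nonincreasing in T.
    """
    lo, hi = min(sample), max(sample)

    def score(t):
        return sum(psi(x - t) for x in sample)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

data = [1.0, 2.0, 3.0, 4.0, 100.0]
t_mean = m_estimate(data, lambda x: x)   # psi(x) = x recovers the sample mean
t_huber = m_estimate(data, huber_psi)    # the outlier's contribution is bounded
```

With $\psi(x) = x$ the root is the sample mean (22 for this sample), while the clipped $\psi$ limits the influence of the observation at 100 and gives a value near 3.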

Under fairly general conditions, it is known that

\[ T_n = T(F_n) \to T(F) \quad \text{a.s. as } n \to \infty, \]

where $F_n$ is the empirical distribution function and $T(F)$ satisfies

\[ \int \psi(x - T(F))\, F(dx) = 0 . \]

Further, the asymptotic distribution of

\[ \sqrt{n}\,(T_n - T(F)) \]

is normal with mean 0 and variance $V(\psi, F)$ given by

\[ V(\psi, F) = \frac{\int \psi(x - T(F))^2\, F(dx)}{\left[ \int \psi'(x - T(F))\, F(dx) \right]^2} = \frac{E(\psi^2)}{[E(\psi')]^2} . \tag{2.1} \]
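The variance formula (2.1) is easy to evaluate numerically for a given pair $(\psi, F)$. The sketch below assumes $F$ is symmetric about 0 and $\psi$ is odd, so that $T(F) = 0$. It uses the standard normal for $F$ and a clipped-linear $\psi$ with cutoff 1.345; both choices, the integration range and the function names are assumptions made for illustration.

```python
import numpy as np

def asymptotic_variance(psi, dpsi, density, lo=-12.0, hi=12.0, n=200001):
    """V(psi, F) = E[psi^2] / (E[psi'])^2, assuming T(F) = 0."""
    x = np.linspace(lo, hi, n)
    f = density(x)

    def integrate(y):
        # Trapezoidal rule on the grid x
        return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

    return integrate(psi(x) ** 2 * f) / integrate(dpsi(x) * f) ** 2

k = 1.345
phi = lambda x: np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)
huber = lambda x: np.clip(x, -k, k)
huber_d = lambda x: (np.abs(x) <= k).astype(float)   # derivative of huber

v_mean = asymptotic_variance(lambda x: x, lambda x: np.ones_like(x), phi)
v_huber = asymptotic_variance(huber, huber_d, phi)
```

At the normal model the mean ($\psi(x) = x$) attains variance 1, and the clipped $\psi$ gives a slightly larger value; that small loss of efficiency is the price paid for protection against heavier-tailed $F$.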

One of the important problems in robust estimation is to find the
statistic $\{T_n\}$ for which the asymptotic variance, taken at its worst
over a class $\mathcal{C}$ of distribution functions $F$, is minimized.

Robust estimates can also be generated by using linear combinations of
order statistics and by the use of statistics derived from rank tests. These
estimates are called L- and R-estimates, respectively. Consider $T_n$ as a
functional of the empirical distribution function $F_n$, given by

\[ T_n = T(F_n) . \]

Then M-estimates are defined by the formula

\[ \int \psi(x - T(F))\, F(dx) = 0 . \tag{2.2} \]

L-estimates are given by

\[ T(F) = \int_0^1 J(t)\, F^{-1}(t)\, dt \tag{2.3} \]

and R-estimates are defined by

\[ \int J\!\left( \tfrac{1}{2}\,[\,F(x) + 1 - F(2T(F) - x)\,] \right) F(dx) = 0 , \]

where, in (2.3), the function $J$ gives the weights in the linear combination
of order statistics.

Restricting the estimates to the translation invariant class and assuming
that the distribution function is symmetric, Huber (1964) showed that the
distribution which minimizes Fisher information plays a major role in
determining robust estimates.

Consider, for example, the class of all $\epsilon$-contaminated normal
distributions, denoted by $\mathcal{C}$, where an element of the class
$\mathcal{C}$ is given by

\[ F(x) = (1-\epsilon)\,\Phi(x) + \epsilon\, H(x), \qquad 0 < \epsilon < 1, \]

$\epsilon$ is known, $\Phi$ is the standard normal cumulative distribution
function and $H(x)$ is a symmetric distribution function. Define

\[ I(F) = \sup_{\psi}\, [V(\psi, F)]^{-1}, \qquad \text{with } \int \psi^2\, dF \ne 0 . \]

Then Huber (1964) proved the following theorems.

Theorem 1. $I(F) < \infty$ if and only if $F$ has an absolutely continuous
density $f(x)$ such that $\int (f'(x)/f(x))^2 f(x)\,dx < \infty$, and then

\[ I(F) = \int \left( \frac{f'(x)}{f(x)} \right)^2 f(x)\, dx . \]

Theorem 2. If $\inf_{F \in \mathcal{C}} I(F) = a < \infty$, then there
exists a unique $F_0 \in \mathcal{C}$ such that $I(F_0) = a$.


Using variational techniques and some guesswork, $F_0$ and the corresponding
$\psi_0$ can be found. For example, in the above case the least favorable
density $f_0(x)$ is given by

\[ f_0(x) = \frac{1-\epsilon}{\sqrt{2\pi}}\, e^{-\rho_0(x)}, \]

where

\[ \rho_0(x) = \begin{cases} x^2/2, & |x| < k, \\ k|x| - k^2/2, & |x| \ge k, \end{cases} \]

and $k$ is chosen so that

\[ \frac{2\varphi(k)}{k} - 2\Phi(-k) = \frac{\epsilon}{1-\epsilon}, \]

with $\varphi = \Phi'$ the standard normal density.

Similar results hold for L- and R-estimates.

Robustness questions related to dependent situations have recently been
studied by Portnoy (1977) and Collins and Portnoy (1979). The variational
problems arising in these applications require modern methods such as those
of the geometry of moment spaces; for reference, see Rustagi (1976). Consider
the time-series moving average model

\[ X_i = \theta + Y_i + \rho\, Y_{i-1} + \rho\, Y_{i+1}, \qquad i = 1, 2, \ldots, n, \]

where

(i)   $Y_0 = Y_n$, $Y_1 = Y_{n+1}$,
(ii)  $\theta$ is a location parameter,
(iii) $X_1, \ldots, X_n$ have a stationary distribution,
(iv)  $|\rho| < 1$,
(v)   $Y_1, Y_2, \ldots, Y_n$ are independently and identically distributed
      random variables having a continuous and symmetric c.d.f. $G$
      with p.d.f. $g(y)$.

The approximate asymptotic variance of the estimates of $\theta$ is given by
an expression $V(\psi, G)$ that holds up to terms of order $O(\rho^2)$ [the
displayed formula is not legible in the source]. Variational methods are used
to find the $\psi$ which minimizes this variance for a fixed $g$; it turns
out to be the same as in the independent case.
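In the independent case, the optimal $\psi$ for the $\epsilon$-contaminated normal class is the clipped-linear function with cutoff $k$, where $k$ solves Huber's (1964) equation $2\varphi(k)/k - 2\Phi(-k) = \epsilon/(1-\epsilon)$. A minimal sketch of solving for $k$ by bisection follows; the function names and the bracketing interval are assumptions made for illustration. The second function checks that the corresponding least favorable density, a normal centre with exponential tails, has total mass one.

```python
import math

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def huber_cutoff(eps, tol=1e-12):
    """Solve 2*phi(k)/k - 2*Phi(-k) = eps/(1 - eps) for k > 0 by bisection.

    The left-hand side is strictly decreasing in k (its derivative is
    -2*phi(k)/k^2), so the root is unique.
    """
    target = eps / (1.0 - eps)

    def g(k):
        return 2.0 * std_normal_pdf(k) / k - 2.0 * std_normal_cdf(-k) - target

    lo, hi = 1e-8, 10.0   # assumed bracket: g(lo) > 0 > g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def least_favorable_mass(eps, k):
    """Total mass of f_0: normal part on [-k, k] plus two exponential tails."""
    centre = (1.0 - eps) * (1.0 - 2.0 * std_normal_cdf(-k))
    tails = (1.0 - eps) * 2.0 * std_normal_pdf(k) / k
    return centre + tails

k05 = huber_cutoff(0.05)
mass = least_favorable_mass(0.05, k05)
```

The equation for $k$ is exactly the normalization condition for $f_0$, which is why the computed mass equals one at the root.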

The second example in the study of robust estimates of location is due to
Bickel (1965). Variational problems occur naturally in finding minimum
efficiency. In that paper, Bickel considers the minimum efficiency of the
Winsorized and trimmed means relative to the mean, over the class of all
symmetric distributions and over the class of symmetric unimodal
distributions.

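Suppose $W_1 < W_2 < \cdots < W_n$ are the order statistics of a sample
$X_1, \ldots, X_n$ from an absolutely continuous distribution function
$F(x)$. Then the $\alpha$-trimmed mean is defined by

\[ \bar{X}_\alpha = \frac{1}{n - 2[\alpha n]} \sum_{i=[\alpha n]+1}^{n-[\alpha n]} W_i , \]

where $[\alpha n]$ is the greatest integer in $\alpha n$, $0 < \alpha < 1/2$.
The $\alpha$-Winsorized mean is defined by

\[ X^*_\alpha = \frac{1}{n} \left\{ [\alpha n]\, W_{[\alpha n]} + \sum_{i=[\alpha n]+1}^{n-[\alpha n]} W_i + [\alpha n]\, W_{n-[\alpha n]+1} \right\} . \]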
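The two estimators can be computed directly from the sorted sample. The sketch below follows the indexing used in the definitions here: the trimmed mean averages the order statistics numbered $[\alpha n]+1$ through $n-[\alpha n]$, and the Winsorized mean gives weight $[\alpha n]$ to the order statistics numbered $[\alpha n]$ and $n-[\alpha n]+1$. Other texts index the tail terms differently, so the tail indices should be read as an assumption tied to this document. Function names are hypothetical.

```python
import math

def trimmed_mean(sample, alpha):
    """alpha-trimmed mean, 0 < alpha < 1/2."""
    w = sorted(sample)
    n = len(w)
    k = math.floor(alpha * n)
    middle = w[k:n - k]            # W_{k+1}, ..., W_{n-k} in 1-based indexing
    return sum(middle) / (n - 2 * k)

def winsorized_mean(sample, alpha):
    """alpha-Winsorized mean with tail terms k*W_k and k*W_{n-k+1}."""
    w = sorted(sample)
    n = len(w)
    k = math.floor(alpha * n)
    middle = w[k:n - k]
    if k == 0:
        return sum(middle) / n     # nothing to Winsorize
    # 1-based W_k is w[k-1]; 1-based W_{n-k+1} is w[n-k]
    return (k * w[k - 1] + sum(middle) + k * w[n - k]) / n

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
tm = trimmed_mean(data, 0.2)       # averages 3, ..., 8
wm = winsorized_mean(data, 0.2)    # (2*2 + 33 + 2*9) / 10
```

For this sample both estimators equal 5.5, whereas the ordinary mean is pulled to 104.5 by the single extreme value.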


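Let $e_1(\alpha)$ and $e_2(\alpha)$ be the asymptotic relative efficiencies
of $\bar{X}_\alpha$ and $X^*_\alpha$ with respect to $\bar{X}$, respectively,
and let $x(\alpha)$ denote the upper $\alpha$-point of $F$, that is
$F(x(\alpha)) = 1 - \alpha$. Then

\[ e_1(\alpha) = (1-2\alpha)^2 \left( \int_{-\infty}^{\infty} x^2 f(x)\,dx \right) \left( \int_{-x(\alpha)}^{x(\alpha)} x^2 f(x)\,dx + 2\alpha\, x(\alpha)^2 \right)^{-1} \]

and

\[ e_2(\alpha) = \frac{\int_{-\infty}^{\infty} x^2 f(x)\,dx}{c + 2\alpha \left( x + \alpha/k \right)^2} , \]

where

(1) $x(\alpha) = x$,
(2) $f(x(\alpha)) = k$,
(3) $\int_{-x}^{x} t^2 f(t)\,dt = c$.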
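As a numerical check, the efficiencies of the trimmed and Winsorized means relative to the mean can be evaluated when $F$ is the standard normal. The sketch assumes the asymptotic variances $\left(\int_{-x}^{x} t^2 f\,dt + 2\alpha x^2\right)/(1-2\alpha)^2$ for the trimmed mean and $\int_{-x}^{x} t^2 f\,dt + 2\alpha (x + \alpha/f(x))^2$ for the Winsorized mean, with $x$ the upper $\alpha$-point of $F$. The choice of the normal model and the function name are assumptions for illustration.

```python
from statistics import NormalDist

def efficiencies_at_normal(alpha):
    """Return (e1, e2): efficiencies of the alpha-trimmed and alpha-Winsorized
    means relative to the mean, at the standard normal distribution."""
    nd = NormalDist()
    x = nd.inv_cdf(1.0 - alpha)        # upper alpha-point x(alpha)
    k = nd.pdf(x)                      # density at x(alpha)
    # c = integral of t^2 * phi(t) over [-x, x]; integration by parts gives
    # (2*Phi(x) - 1) - 2*x*phi(x) for the standard normal.
    c = (2.0 * nd.cdf(x) - 1.0) - 2.0 * x * k
    var_mean = 1.0                     # variance of the standard normal
    e1 = (1.0 - 2.0 * alpha) ** 2 * var_mean / (c + 2.0 * alpha * x ** 2)
    e2 = var_mean / (c + 2.0 * alpha * (x + alpha / k) ** 2)
    return e1, e2

e1, e2 = efficiencies_at_normal(0.25)
```

Both values are below one at the normal model, as expected, since the sample mean is the efficient estimator there; the interest of the theorem that follows lies in how small these efficiencies can become over a whole class of distributions.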


Let $\mathcal{F}$ be the class of all symmetric unimodal distribution
functions. Then the following theorem is proved.

Theorem: [The statement concerns $\inf_{F \in \mathcal{F}}$ of the
efficiencies defined above; its displayed formula is not legible in the
source.]

Using Lagrange's method of undetermined multipliers, the Euler-Lagrange
equations of the calculus of variations provide the minimizing densities
$f(x)$. Detailed proofs are in Bickel (1965).


3. Admissibility questions and variational methods

Necessary and sufficient conditions for an estimator of the mean of a
multivariate normal distribution to be admissible under a squared loss
function have been discussed by Brown (1971). The problem of admissibility is
directly related to problems of diffusion. This correspondence is established
through classical variational techniques using the Euler-Lagrange equation.

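Let $p_\theta(x)$ be the $m$-dimensional multivariate normal density of the
random vector $X$, with mean $\theta$ and identity covariance matrix. Let
$\delta(x)$ denote an estimate of $\theta$. Suppose the loss function is
given by

\[ L(\theta, \delta) = (\delta - \theta)' D (\delta - \theta), \]

where $D$ is a known diagonal matrix. We have the following notation:

\[ R(\theta, \delta) = E_\theta[L(\theta, \delta(X))], \]

$G(\theta)$ = prior distribution function of $\theta$,

\[ B(G, \delta) = \int R(\theta, \delta)\, G(d\theta) = \text{Bayes risk}. \]

It is well known that the Bayes estimator is given by

\[ \delta_G(x) = \frac{\int \theta\, p_\theta(x)\, G(d\theta)}{\int p_\theta(x)\, G(d\theta)} . \]

Let $\|y\|^2 = y'Dy$. When $D = I$, $\|y\|^2 = |y|^2 = \sum_{i=1}^{m} y_i^2$.
Suppose

\[ g^*(x) = \int p_\theta(x)\, G(d\theta) \]

and

\[ \nabla g^*(x) = \left( \frac{\partial g^*}{\partial x_1}, \frac{\partial g^*}{\partial x_2}, \ldots, \frac{\partial g^*}{\partial x_m} \right) . \tag{3.1} \]

Then

\[ \delta_G(x) = x + \frac{\nabla g^*(x)}{g^*(x)} . \tag{3.2} \]

Define

\[ \gamma_G(x) = \delta_G(x) - x . \tag{3.3} \]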
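The representation of the Bayes estimator as $x$ plus the gradient of the marginal density divided by the marginal density can be checked numerically in one dimension. The sketch below assumes $X \sim N(\theta, 1)$ and a $N(0, \tau^2)$ prior, for which the posterior mean is known to be $\tau^2 x/(1+\tau^2)$. The prior, the grid and the function name are assumptions chosen for illustration.

```python
import numpy as np

def bayes_estimate_via_marginal(x, tau=2.0, h=1e-4):
    """Evaluate x + g*'(x)/g*(x) for X ~ N(theta, 1), theta ~ N(0, tau^2)."""
    theta = np.linspace(-12.0 * tau, 12.0 * tau, 20001)
    d_theta = theta[1] - theta[0]
    prior = np.exp(-0.5 * (theta / tau) ** 2) / (tau * np.sqrt(2.0 * np.pi))

    def g_star(z):
        # Marginal density: integral of p_theta(z) G(d theta), by a Riemann sum
        lik = np.exp(-0.5 * (z - theta) ** 2) / np.sqrt(2.0 * np.pi)
        return float(np.sum(lik * prior) * d_theta)

    # Central difference approximation to the derivative of g*
    grad = (g_star(x + h) - g_star(x - h)) / (2.0 * h)
    return x + grad / g_star(x)

estimate = bayes_estimate_via_marginal(1.5, tau=2.0)
posterior_mean = 4.0 / 5.0 * 1.5   # tau^2 * x / (1 + tau^2)
```

The two quantities agree to numerical precision, which illustrates why the admissibility problem can be phrased entirely in terms of the marginal density.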


The necessary and sufficient conditions for admissibility were given by
Stein (1955). One of the conditions for an estimator $\delta_F(x)$ with
prior $F$ to be admissible is the following.

Stein's condition: $\delta_F$ is admissible only if there exist non-negative
finite Borel measures $G_i$, $i = 1, 2, \ldots$, with $G_i$ having compact
support and $G_i(\{0\}) = 1$, such that

\[ B(G_i, \delta_F) - B(G_i, \delta_{G_i}) \to 0 \quad \text{as } i \to \infty . \tag{3.4} \]

Condition (3.4) can be written in the following form if we define $f$ and
$f^*$ related to the c.d.f. $F$ in the same way as we defined $g$ and $g^*$
related to the c.d.f. $G$:

\[ B(G_i, \delta_F) - B(G_i, \delta_{G_i}) = \int \|\delta_F(x) - \delta_{G_i}(x)\|^2\, g_i^*(x)\,dx = 4 \int \|\nabla j_i(x)\|^2 f^*(x)\,dx = 4\, I(j_i(x)), \]

where

\[ j_i(x) = \left( \frac{g_i^*(x)}{f^*(x)} \right)^{1/2} . \]

The condition of admissibility is then reduced to the problem of minimizing

\[ I(j(x)) = \int \|\nabla j(x)\|^2 f^*(x)\,dx \tag{3.5} \]

subject to the constraints

(i)  $j(x) \ge 1$ for $|x| \le 1$,
(ii) $\lim_{r \to \infty} \sup_{|x| = r} j(x) = 0$.

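The Euler-Lagrange equation for (3.5) is given by

\[ f^*(x) \sum_{i=1}^{m} \frac{\partial^2 j}{\partial x_i^2} + \sum_{i=1}^{m} \frac{\partial f^*(x)}{\partial x_i}\, \frac{\partial j}{\partial x_i} = 0 . \]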
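The equation can be obtained from the first variation of the functional being minimized. The following is a sketch for the case $D = I$, assuming that $j$ is smooth and that the perturbation $\eta$ vanishes wherever $j$ is constrained, so that the boundary terms from integration by parts drop out; these regularity assumptions are made for the sketch and are not stated in the text.

```latex
% First variation of I(j) = \int \|\nabla j\|^2 f^* dx, with D = I.
\begin{align*}
I(j) &= \int \sum_{i=1}^{m} \Bigl(\frac{\partial j}{\partial x_i}\Bigr)^{2} f^{*}(x)\,dx, \\
\frac{d}{dt}\, I(j + t\eta)\Big|_{t=0}
  &= 2 \int \sum_{i=1}^{m} \frac{\partial j}{\partial x_i}\,
       \frac{\partial \eta}{\partial x_i}\, f^{*}(x)\,dx \\
  &= -2 \int \eta(x) \sum_{i=1}^{m} \frac{\partial}{\partial x_i}
       \Bigl( f^{*}(x)\,\frac{\partial j}{\partial x_i} \Bigr)\,dx .
\end{align*}
% The variation vanishes for every admissible \eta only if
\[
\sum_{i=1}^{m} \frac{\partial}{\partial x_i}
  \Bigl( f^{*}(x)\,\frac{\partial j}{\partial x_i} \Bigr)
 = f^{*}(x) \sum_{i=1}^{m} \frac{\partial^{2} j}{\partial x_i^{2}}
 + \sum_{i=1}^{m} \frac{\partial f^{*}(x)}{\partial x_i}\,
   \frac{\partial j}{\partial x_i} = 0 .
\]
```

Written in divergence form, the equation describes a steady-state diffusion with conductivity $f^*$, which is the link to diffusion problems mentioned at the start of this section.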

This is an elliptic partial differential equation and its solution provides
an answer to the admissibility question posed above. Brown used elaborate
machinery to show that a solution to the elliptic differential equation
exists for $|x| > 1$, which shows the inadmissibility of the usual Bayes
estimator of the multivariate normal mean $\theta$ for $m > 2$.


4. Variational methods and penalized maximum likelihood estimates


A method of estimating probability densities utilizing penalty functions
was introduced by Good (1971) and was developed further by Good and Gaskins
(1971). To remove roughness in the estimated probability density function,
Good and Gaskins maximize not the log likelihood itself but the log
likelihood adjusted by a known functional of the density. The optimization
problems so introduced lead naturally to variational problems. Many such
problems in their abstract form have recently been studied by DeMontricher,
Tapia and Thompson (1975).

Given that $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from an
unknown density function $f(x)$, the penalized maximum likelihood estimates
of $f$ are defined by maximizing

\[ L(f) = \prod_{i=1}^{n} f(x_i)\; e^{-\Phi(f)} \tag{4.1} \]

subject to the constraints

\[ \int f(x)\,dx = 1 \tag{4.2} \]

and

\[ f(x) \ge 0 . \tag{4.3} \]

In a slightly more abstract form, the problem is formulated in terms of the
following notation. Let

$\Omega$ = interval $(a, b)$,
$L^1(\Omega)$ = class of Lebesgue integrable functions, with $f \in L^1(\Omega)$,
$H(\Omega)$ = manifold in $L^1(\Omega)$,
$\Phi : H(\Omega) \to R$.

The variational problem is to maximize (4.1) over the class $H(\Omega)$
subject to constraints (4.2) and (4.3) for all $x \in \Omega$. The existence
and uniqueness of the maximizing $f$ is given by the following theorem due to
DeMontricher, Tapia and Thompson.

Theorem 1. Suppose $H(\Omega)$ is a reproducing kernel Hilbert space,
integration over $\Omega$ is a continuous functional, and there exists at
least one $f$ with

\[ f(x) \ge 0 \text{ for all } x \in \Omega, \qquad \int f\,dx = 1 \qquad \text{and} \qquad f(x_i) > 0, \quad i = 1, 2, \ldots, n . \]

Then the maximum penalized likelihood estimate corresponding to $H(\Omega)$
exists and is unique.

Under certain additional assumptions, the solution of the above problem
can be characterized as a polynomial spline. Motivated by information
theoretic considerations, Good and Gaskins considered the first penalized
maximum likelihood estimate of the density function by using

\[ \Phi(f) = \alpha \int_{-\infty}^{\infty} \frac{f'(t)^2}{f(t)}\,dt = 4\alpha \int_{-\infty}^{\infty} \left( \frac{d f^{1/2}}{dt} \right)^2 dt , \qquad \alpha > 0 . \tag{4.4} \]
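The equality of the two forms of the penalty in (4.4) follows from $(\sqrt{f})' = f'/(2\sqrt{f})$, and it can be checked numerically. The sketch below uses a normal density with standard deviation $\sigma$, for which the integral of $f'^2/f$ equals $1/\sigma^2$. The test density, the grid and the function name are assumptions chosen for illustration.

```python
import numpy as np

def roughness_penalties(sigma=1.5, alpha=1.0, n=200001):
    """Evaluate both forms of the first penalty for a N(0, sigma^2) density."""
    t = np.linspace(-10.0 * sigma, 10.0 * sigma, n)
    dt = t[1] - t[0]
    f = np.exp(-0.5 * (t / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    df = np.gradient(f, t)             # numerical f'
    du = np.gradient(np.sqrt(f), t)    # numerical (sqrt f)'
    form_a = alpha * float(np.sum(df ** 2 / f) * dt)      # alpha * int f'^2 / f
    form_b = 4.0 * alpha * float(np.sum(du ** 2) * dt)    # 4*alpha * int (sqrt f)'^2
    return form_a, form_b

form_a, form_b = roughness_penalties(sigma=1.5, alpha=1.0)
```

Both values are close to $1/\sigma^2 \approx 0.444$, so for this family the penalty grows as the density becomes more concentrated.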

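Assume that $H(\Omega)$ is such that

\[ \sqrt{f} \in H^1(-\infty, \infty) . \]

The functional to be optimized is still

\[ \prod_{i=1}^{n} f(x_i)\; e^{-\Phi(f)} . \tag{4.5} \]

Suppose $u = \sqrt{f}$; then the optimization problem above is of the
following form:

\[ \max_{u}\; \prod_{i=1}^{n} u^2(x_i)\; \exp\!\left( -4\alpha \int u'(t)^2\,dt \right) \]

subject to the constraints

\[ u \in H^1(-\infty, \infty) \qquad \text{and} \qquad \int u^2(t)\,dt = 1 . \tag{4.6} \]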
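To see how the penalty acts, it helps to restrict the problem to a parametric family. The sketch below assumes $u^2$ is a normal density with mean $\mu$ and variance $\sigma^2$; this restriction is an assumption made only for illustration and is not the function-space problem treated in the text. For this family the penalty equals $\alpha/\sigma^2$, so the penalized log likelihood is maximized at the sample mean and at $\sigma^2 = (S + 2\alpha)/n$, where $S$ is the sum of squared deviations from the mean.

```python
import math

def penalized_normal_fit(sample, alpha):
    """Maximizer of the penalized log likelihood within the normal family."""
    n = len(sample)
    mu = sum(sample) / n
    s = sum((x - mu) ** 2 for x in sample)
    # d/d(var) of [-(n/2) log var - s/(2 var) - alpha/var] = 0
    # gives var = (s + 2*alpha) / n.
    return mu, (s + 2.0 * alpha) / n

def penalized_loglik(sample, mu, var, alpha):
    """log prod f(x_i) - alpha/var for f = N(mu, var)."""
    n = len(sample)
    s = sum((x - mu) ** 2 for x in sample)
    return -0.5 * n * math.log(2.0 * math.pi * var) - s / (2.0 * var) - alpha / var

data = [-1.0, 0.0, 0.5, 2.0, 3.5]
mu_hat, var_hat = penalized_normal_fit(data, alpha=1.0)
```

Compared with the unpenalized estimate $S/n$, the penalized variance is larger by $2\alpha/n$: the roughness penalty flattens the fitted density, and its effect fades as $n$ grows.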

DeMontricher, Tapia and Thompson show that the first maximum penalized
likelihood estimate of Good and Gaskins exists and is unique.

The second maximum penalized likelihood estimator is defined with the help of

\[ \Phi(f) = \alpha \int f'(t)^2\,dt + \beta \int f''(t)^2\,dt \tag{4.7} \]

for some $\alpha > 0$ and $\beta > 0$.

Although in this case also one can show that the estimate exists and is
unique, it is not possible to obtain the estimate by the approach provided by
Good and Gaskins.

5. Comments

The wide variety of applications of variational techniques illustrated by
the examples above shows their importance as a necessary tool for a
statistician. Once a problem has been formulated so that its variational
character is apparent, there are many available techniques to solve it.
There is, however, a large class of problems which need further study.
Consider the problem of feedback control where the equations governing the
motion of a particle are not known. Suppose these equations are estimated
from data. The dynamic programming solution to such a feedback problem
requires a different approach, and the statistical dynamic programming
solution then naturally leads to open questions. Distributions and
stochastic convergence of the solution are now needed, and interpretation
of the optimal policy is required in view of the estimated relations.